
Reduce memory usage when searching for commits and issues#159

Closed
marcuscaisey wants to merge 1 commit into gharlan:main from marcuscaisey:search-memory

Conversation

@marcuscaisey
Contributor

@marcuscaisey marcuscaisey commented May 8, 2026

Problem

When searching for commits or issues in a large repository, the gh script filter can exhaust all of its available memory and crash.

For example, the query gh neovim/neovim *72cf89bce8 (this commit was chosen because it's the repository's initial commit) produces the following output in the debugger:

[14:30:21.357] ERROR: GitHub[Script Filter] Code 255: loading content for https://api.github.com/repos/gharlan/alfred-github-workflow/releases/latest
loading content for https://api.github.com/user?per_page=100
loading content for https://api.github.com/repos/neovim/neovim/commits?per_page=100
PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 405504 bytes) in /Users/marcus/scratch/alfred-github-workflow/workflow.php on line 240
PHP Stack trace:
PHP   1. {main}() /Users/marcus/Library/Caches/com.runningwithcrayons.Alfred/Workflow Scripts/A405D906-43C3-468F-BF09-54D4D9D31796:0
PHP   2. Search::run($scope = 'github', $query = ' neovim/neovim *72cf89bce8', $hotkey = '0') /Users/marcus/Library/Caches/com.runningwithcrayons.Alfred/Workflow Scripts/A405D906-43C3-468F-BF09-54D4D9D31796:5
PHP   3. Search::addRepoSubCommands() /Users/marcus/scratch/alfred-github-workflow/search.php:80
PHP   4. Workflow::requestApi($url = '/repos/neovim/neovim/commits', $curl = *uninitialized*, $callback = *uninitialized*, $firstPageOnly = *uninitialized*, $maxAge = *uninitialized*) /Users/marcus/scratch/alfred-github-workflow/search.php:355
PHP   5. Workflow::requestCache($url = 'https://api.github.com/repos/neovim/neovim/commits?per_page=100', $curl = NULL, $callback = NULL, $firstPageOnly = FALSE, $maxAge = 10, $refreshInBackground = *uninitialized*) /Users/marcus/scratch/alfred-github-workflow/workflow.php:303
PHP   6. Curl->execute() /Users/marcus/scratch/alfred-github-workflow/workflow.php:293
PHP   7. Workflow::{closure:/Users/marcus/scratch/alfred-github-workflow/workflow.php:256-258}($response = class CurlResponse { public $request = class CurlRequest { public $url = 'https://api.github.com/repositories/16408992/commits?per_page=100&page=93'; public $etag = NULL; public $token = '****************************************'; public $callback = class Closure { ... } }; public $status = 200; public $contentType = 'application/json; charset=utf-8'; public $etag = 'W/"f85dd32fb287dfdb16b50304ee27c01f83ea42bd3c12f86c08b154d73f44b1a3"'; public $link = '<https://api.github.com/repositories/16408992/commits?per_page=100&page=94>; rel="next", <https://api.github.com/repositories/16408992/commits?per_page=100&page=365>; rel="last", <https://api.github.com/repositories/16408992/commits?per_page=100&page=1>; rel="first", <https://api.github.com/repositories/16408992/commits?per_page=100&page=92>; rel="prev"'; public $content = ... }) /Users/marcus/scratch/alfred-github-workflow/curl.php:66
PHP   8. Workflow::{closure:/Users/marcus/scratch/alfred-github-workflow/workflow.php:224-285}($response = class CurlResponse { public $request = class CurlRequest { public $url = 'https://api.github.com/repositories/16408992/commits?per_page=100&page=93'; public $etag = NULL; public $token = '****************************************'; public $callback = class Closure { ... } }; public $status = 200; public $contentType = 'application/json; charset=utf-8'; public $etag = 'W/"f85dd32fb287dfdb16b50304ee27c01f83ea42bd3c12f86c08b154d73f44b1a3"'; public $link = '<https://api.github.com/repositories/16408992/commits?per_page=100&page=94>; rel="next", <https://api.github.com/repositories/16408992/commits?per_page=100&page=365>; rel="last", <https://api.github.com/repositories/16408992/commits?per_page=100&page=1>; rel="first", <https://api.github.com/repositories/16408992/commits?per_page=100&page=92>; rel="prev"'; public $content = ... }, $content = NULL, $parent = 'https://api.github.com/repositories/16408992/commits?per_page=100&page=92') /Users/marcus/scratch/alfred-github-workflow/workflow.php:257
PHP   9. json_encode($value = ...) /Users/marcus/scratch/alfred-github-workflow/workflow.php:240

I wrote a small script which performs the above query:

<?php
require 'search.php';
Search::run('github', ' neovim/neovim *72cf89bce8', getenv('hotkey'));
echo Workflow::getItemsAsXml();

and profiled it using https://github.com/arnaud-lb/php-memory-profiler:

MEMPROF_PROFILE=dump_on_limit php test.php

The resulting profile surfaced json_decode as the biggest offender:
[screenshot: memory profile showing json_decode as the largest allocator]

The issue is that in Workflow::requestCache, we store all of the responses from the API in an array:

$responses[] = $response->content;

So for a large repository with lots of commits, the size of this array can outgrow the default memory limit of 128MB.
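The accumulating pattern can be illustrated with a minimal sketch (the function and variable names here are illustrative, not the workflow's actual code):

```php
<?php
// Illustrative sketch of the problem: each decoded page is appended to
// $responses and kept alive until pagination finishes, so peak memory
// grows linearly with the number of pages (365 pages for neovim/neovim).
function fetchAllPages(callable $fetchPage): array {
    $responses = [];
    for ($page = 1; ($content = $fetchPage($page)) !== null; $page++) {
        $responses[] = $content; // every decoded page stays in memory here
    }
    // Only after the last page is the combined result handed to the caller.
    return array_merge(...$responses);
}

// Simulate 5 pages of 100 items each.
$pages = array_fill(0, 5, range(1, 100));
$all = fetchAllPages(function ($page) use ($pages) {
    return $pages[$page - 1] ?? null;
});
echo count($all), "\n"; // 500
```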

Solution

  • Add an optional $transformItem parameter to Workflow::requestCache and Workflow::requestApi. When provided, $transformItem is called to transform each item returned from the API into another form.
  • Construct the Item object for each listed commit and issue using $transformItem. By doing so, we can throw away the large response object that we were previously storing in $responses and only store the data that we need.
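The idea behind $transformItem can be sketched as follows; the signatures here are simplified for illustration and do not mirror Workflow::requestCache exactly:

```php
<?php
// Hedged sketch of the $transformItem approach: transform each API item
// into the small value we actually need as soon as it arrives, and let
// the large raw item be garbage-collected before the next page is fetched.
function fetchAllPages(callable $fetchPage, ?callable $transformItem = null): array {
    $items = [];
    for ($page = 1; ($content = $fetchPage($page)) !== null; $page++) {
        foreach ($content as $item) {
            // Store only the transformed (small) representation.
            $items[] = $transformItem !== null ? $transformItem($item) : $item;
        }
        unset($content); // the full page can be freed before the next request
    }
    return $items;
}

// Example: keep only the short hash and the first line of each commit message.
$pages = [[['sha' => '72cf89bce8deadbeef', 'message' => "init\ndetails"]]];
$items = fetchAllPages(
    function ($page) use ($pages) { return $pages[$page - 1] ?? null; },
    function ($commit) {
        return [substr($commit['sha'], 0, 10), strtok($commit['message'], "\n")];
    }
);
echo $items[0][0], "\n"; // 72cf89bce8
```

With this shape, peak memory is bounded by one raw page plus the transformed items, rather than all raw pages at once.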

After these changes, the gh script filter no longer crashes on the input gh neovim/neovim *72cf89bce8.

Effectiveness

To measure the effectiveness of this solution, I've written a slightly modified version of the above test script which raises the memory limit to 1GB so that it won't crash:

<?php
ini_set('memory_limit', '1G');
require 'search.php';
Search::run('github', ' neovim/neovim *72cf89bce8', getenv('hotkey'));

and measured the memory usage using

/usr/bin/time -l php test.php

Before

        1.41 real         1.14 user         0.25 sys
           529268736  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               33559  page reclaims
                  91  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 493  involuntary context switches
         15237059400  instructions retired
          4447079530  cycles elapsed
           511689544  peak memory footprint

After

        1.23 real         1.04 user         0.18 sys
            70352896  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                6509  page reclaims
                 188  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                  87  involuntary context switches
         15096525999  instructions retired
          3906278294  cycles elapsed
            52544016  peak memory footprint

Conclusion

The maximum resident set size has decreased from 529MB to 70MB, a reduction of 87%.

@gharlan
Owner

gharlan commented May 10, 2026

Thanks for the analysis and the fix, very nice! ❤️
I discussed this with Claude and we came up with another approach. With d98ceb9 we reduce the data before caching, which shrinks the cache database in addition to the memory usage.
And in fda934d we introduced streaming so that all of the raw data never has to be held in memory at once.
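The streaming idea can be sketched with a generator; this is a simplified illustration under my own naming, not the actual code in d98ceb9 or fda934d:

```php
<?php
// Hedged sketch of streaming with reduction: a generator yields each item
// already reduced to its small form, page by page, so neither the full raw
// page set nor the unreduced items ever accumulate, and only the reduced
// form would ever reach a cache.
function streamItems(callable $fetchPage, callable $reduce): Generator {
    for ($page = 1; ($content = $fetchPage($page)) !== null; $page++) {
        foreach ($content as $item) {
            yield $reduce($item); // reduced before anything is retained
        }
    }
}

// Example: stream two pages and consume them one item at a time.
$pages = [range(1, 3), range(4, 6)];
$sum = 0;
foreach (streamItems(
    function ($page) use ($pages) { return $pages[$page - 1] ?? null; },
    function ($n) { return $n * $n; }
) as $value) {
    $sum += $value;
}
echo $sum, "\n"; // 91
```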

With both commits I get this for your test script:

        0,70 real         0,59 user         0,07 sys
            44974080  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                3904  page reclaims
                 215  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   9  messages sent
                  11  messages received
                   0  signals received
                 141  voluntary context switches
                 667  involuntary context switches
          7117522848  instructions retired
          1636651146  cycles elapsed
            15778392  peak memory footprint

(Before the refactoring the numbers were similar to your "before" numbers)
